Add reading grib files for stats #33

andleuth · 2024-07-19T14:51:39Z

add reading grib files for stats computation
use pinned environment for CI
make conda necessary for environment creation

requirements/dev-requirements.yml

util/model_output_parser.py

stelliom · 2024-07-25T16:08:02Z

Hi @andleuth,

I looked at the code and added some comments (see above). I also tried to follow the usual ICON+probtest manual workflow (see here) with this branch; however, I am encountering errors when I try to generate the stats, both with the --no-ensemble option (for the GPU results to check) and without it (for the CPU references). The error I am getting is the following:

[mch_gpu_mixed] mstellio@balfrin-ln002:/scratch/mch/mstellio/test-pt-grib/gpu> python ../externals/probtest/probtest.py stats --no-ensemble --stats-file-name stats_gpu.csv --model-output-dir $SCRATCH/test-pt-grib/gpu/experiments/mch_icon-ch1
initialized logger with level INFO
reading config file from probtest.json
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ecCodes provides no latitudes/longitudes for gridType='unstructured_grid'
Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1475, in _construct_dataarray
    variable = self._variables[name]
               ~~~~~~~~~~~~~~~^^^^^^
KeyError: 'step'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1574, in __getitem__
    return self._construct_dataarray(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1477, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 210, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'step'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/gpu/../externals/probtest/probtest.py", line 57, in <module>
    cli()
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/engine/stats.py", line 100, in stats
    create_stats_dataframe(
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/engine/stats.py", line 15, in create_stats_dataframe
    df = df_from_file_ids(file_id, input_dir, file_specification)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/dataframe_ops.py", line 135, in df_from_file_ids
    var_df = read_input_file(
             ^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/dataframe_ops.py", line 70, in read_input_file
    var_dfs = file_parser(label, file_name, specification)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/model_output_parser.py", line 90, in parse_grib
    sub_df = dataframe_from_ncfile(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/model_output_parser.py", line 236, in dataframe_from_ncfile
    time = xarray_ds[time_dim].values
           ~~~~~~~~~^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1576, in __getitem__
    raise KeyError(
KeyError: "No variable named 'step'. Variables on the dataset include ['number', 'time', 'isobaricInhPa', 'z']"

I am not sure if you already encountered something similar? If yes, can you recall what the solution was? This reminds me of the issue you had with the steps' units.

PS: I tested this on mch_icon-ch1 with 1-hourly output.

util/model_output_parser.py

andleuth · 2024-07-29T12:36:09Z

Hi @stelliom, the error occurred because I did not activate the conda environment before setting the ECCODES_DEFINITION_PATH. Thanks for your hint!
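
For anyone hitting the same `KeyError: 'step'`, a small pre-flight check along these lines might catch the problem before any GRIB file is opened. This is only a sketch: `check_eccodes_env` is a hypothetical helper and not part of probtest. It assumes that `conda activate` exports `CONDA_PREFIX` and that this workflow expects `ECCODES_DEFINITION_PATH` to be set (ecCodes itself can fall back to its built-in definitions when the variable is absent).

```python
import os


def check_eccodes_env(environ=None):
    """Return a list of human-readable problems with the ecCodes setup.

    Hypothetical helper, not part of probtest. An empty list means both
    variables this check looks at are set.
    """
    env = os.environ if environ is None else environ
    problems = []
    # `conda activate` exports CONDA_PREFIX; if it is missing, no
    # environment is active in this shell.
    if not env.get("CONDA_PREFIX"):
        problems.append("no conda environment is active (CONDA_PREFIX unset)")
    # Assumption: this workflow relies on ECCODES_DEFINITION_PATH to point
    # ecCodes at the right GRIB definitions.
    if not env.get("ECCODES_DEFINITION_PATH"):
        problems.append("ECCODES_DEFINITION_PATH is not set")
    return problems


if __name__ == "__main__":
    for problem in check_eccodes_env():
        print(f"WARNING: {problem}")
```

The check only looks at whether the variables are set, not at the order in which they were set, so it would not catch every variant of the mistake described above.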

util/model_output_parser.py

jonasjucker · 2024-08-06T07:20:33Z

A small comment:

It would be nice to have that new feature covered with a unit test.

setup_env.sh

stelliom · 2024-08-07T15:40:00Z

Now, I propose these as next steps:

Duplicate the icon-ch1_small experiment, but change the filetype of the output to GRIB, such that we have an icon-ch1_small experiment (NetCDF output) and an icon-ch1_small_grb experiment (GRIB output).
Add this new icon-ch1_small_grb experiment to BuildBot, basically by looking in all places in ICON (BB-related scripts) where the original icon-ch1_small is present and mimicking that code for icon-ch1_small_grb. However, keep in mind that for icon-ch1_small_grb the file_ids will need to be different (i.e. GRIB and not NetCDF).
Add the activation and deactivation of probtest's Conda environment before and after calls to probtest in ICON (mostly BB scripts probably), respectively.
Once the above are done, open a MR in icon-nwp and comment runBB on that MR. This way we should be able to see if the test is being run correctly by BB. After this is working we can start looking into the reference generation part.

…nto reading_grib_files

requirements/environment.yml

stelliom · 2024-10-28T14:38:06Z

setup_env.sh

+
+
+# Setting ECCODES_DEFINITION_PATH:
+${CONDA} activate ${ENV_NAME}


Was this also tested with the -m option? I.e. where CONDA=mamba.

Hmm, I don't think so. So maybe it is better to remove this option?

Mmmh, I am not sure if we ever use mamba for probtest. I guess that is coming from the blueprint, right?

Judging from line 53 of this file, ${CONDA} activate/deactivate is probably an issue when using mamba, so we should at least change those commands to just conda activate/deactivate. The rest is probably fine, judging from the code above.

However, if we don't care about mamba here, then I agree that we could actually remove it entirely.

Yes, it comes from the blueprint. Then let's keep it lean and remove the option.

stelliom · 2024-10-28T14:40:48Z

templates/ICON.jinja

@@ -11,6 +11,7 @@
            "member_type": "{{member_type}}",
            "factor": 5,
            "file_specification": [{
+                "GRIB": {"format": "GRIB", "time_dim": "step", "horizontal_dims": ["values"], "var_excl": ["tlon", "tlat", "vlon", "vlat", "ulon", "ulat", "h", "slor", "anor", "isor", "sdor"], "fill_value_key": "_FillValue"},


I think at some point we may want to come up with a more flexible solution for var_excl, so that we don't have to hardcode it here. But I think this is not in the scope of this PR 👍

I am open to suggestions. Maybe it is worth raising an issue?

I cannot remember now what the actual error was if those variables were not excluded, but at the time we were pretty sure that we did not need to test those values, since they seemed to be dimensions/coordinates. If I remember correctly from what Andrea told me, in GRIB there is no distinction between variables and dimensions/coordinates as in NetCDF, do you know if this is true?

I think that the names of these variables are probably not going to be the same in every GRIB file, so the current solution might be very specific to our needs and therefore pretty fragile in case someone wants to use probtest outside ICON or even with different experiments. If we could find a way to distinguish between variables and these "dimensional fields" within the code, then we could simply fill the short_names_excl list using such criterion. I am not an expert with GRIB files, but I guess these variables (judging from their names) probably do not change with time, therefore they might not have a time dimension. Maybe we could leverage this and use it as a criterion to exclude them from probtest, assuming we are never interested in testing variables that are constant in time, which seems reasonable to me. Another uglier option could be to use whatever error those variables were causing to our advantage by handling it in a try-statement and adding the name of the variable to the short_names_excl list in case of error.
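
The "no time dimension" criterion could be sketched roughly as follows. This is an illustration under assumptions, not probtest code: `constant_in_time` is a hypothetical helper, and the dimension tuples in the example are made up rather than read from an actual ICON GRIB file.

```python
def constant_in_time(var_dims, time_dim="step"):
    """Return the sorted names of variables that lack the time dimension.

    `var_dims` maps each variable name to a tuple of its dimension names.
    Hypothetical helper, not part of probtest.
    """
    return sorted(name for name, dims in var_dims.items() if time_dim not in dims)


# Illustrative layout only; the dimension tuples are invented for the example.
example = {
    "t": ("step", "generalVerticalLayer", "values"),
    "tlon": ("values",),
    "tlat": ("values",),
}
print(constant_in_time(example))  # ['tlat', 'tlon']
```

With xarray, the mapping could be built as `{name: da.dims for name, da in ds.data_vars.items()}`. One caveat: a file holding a single time step may not expose the time dimension on any variable, so the criterion would need extra care in that case.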

I think we could raise an issue for this, but maybe it is not super urgent. I guess it depends on how much the list of excluded variables changes from file to file and how many different people will attempt to use probtest on GRIB files.

Sorry for the wall of text 😁

Thanks for the wall of text :)
I am not a GRIB expert, but I would assume it is true. We had a similar problem with the fof files, and there we just included files that have the specific horizontal dimensions, which excludes all the dimension variables.
But I think for now I would just put it all in a separate issue.

Yes, I agree! Will you open the issue or should I do it?

tests/engine/test_stats.py

stelliom · 2024-10-28T15:16:54Z

util/dataframe_ops.py

@@ -185,6 +192,58 @@ def unify_time_index(fid_dfs):
    return fid_dfs_out


+def adjust_time_index(fid_dfs):


Why is this required as part of this PR? I.e. for reading GRIB files.

Well, it is a workaround, and I am not 100% satisfied with it. There is unify_time_index, which converts the times to integer timesteps. This does not work for GRIB files that are written every 10s, so I added the adjust_time_index routine, which fixes this by simply overwriting the times with the time steps.

I see, ok. However, I am not sure I understand why the fact that GRIB files are written every 10s causes an issue. Can you not just replace the times with integer timesteps, as is done for NetCDF?
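
For reference, "replacing the times with integer timesteps" can be pictured with a minimal sketch like the one below. It is not the actual `unify_time_index` or `adjust_time_index` implementation; `to_integer_timesteps` is a hypothetical helper that works on a plain list of time labels.

```python
def to_integer_timesteps(times):
    """Map each time label to its rank among the sorted unique labels.

    Hypothetical helper, not part of probtest.
    """
    rank = {t: i for i, t in enumerate(sorted(set(times)))}
    return [rank[t] for t in times]


# Output written every 10 s, with each label repeated for two variables.
seconds = [0, 0, 10, 10, 20, 20]
print(to_integer_timesteps(seconds))  # [0, 0, 1, 1, 2, 2]
```

Ranking the unique labels makes the result independent of the output interval, which is the property the question above is asking about.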

util/model_output_parser.py

stelliom

Hi @huppd,

I looked at your changes compared to what I already reviewed for Andrea. I added a couple of comments/doubts, but in general this looks fine to me. However, I have to admit I am not super sure about the Python tests, since I am not an expert on that.

I will approve the PR now, since I will be on vacation for the rest of the week. This way you will still be able to merge in my absence. Otherwise, I will come back to this next Monday 🙂

Cheers,
Mikael

huppd · 2024-11-07T14:40:43Z

Hi @stelliom
thanks a lot for the valuable review.
I hope I could address your remarks properly.
Cheers,
Daniel

Andrea Leuthard added 2 commits July 18, 2024 13:08

added grib_parser

3ec8155

eccodes_definition in setup_env

ae3a747